Round 1: Data Modeling & SQL
🔹Write a SQL query to fetch the top 3 highest-earning employees from each department.
🔹How would you design a data model for an e-commerce application (customers, products, orders)?
🔹Explain normalization and denormalization. When would you use each?
🔹What is a composite primary key, and in which scenario would you use it?
🔹How do you handle performance tuning in SQL queries?
🔹How would you implement slowly changing dimensions (SCDs) in a data warehouse?
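For the first question in this round, one possible answer uses a window function. Below is a minimal sketch run through Python's sqlite3 module so it is self-contained. The employees table, its columns, and the sample rows are made up for illustration, and it assumes SQLite 3.25 or newer (window function support). It uses DENSE_RANK so that ties on salary share a rank, which means a department can return more than three rows; swap in ROW_NUMBER if exactly three rows per department are wanted.

```python
import sqlite3

# In-memory database with a hypothetical employees table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (
    id         INTEGER PRIMARY KEY,
    name       TEXT,
    department TEXT,
    salary     INTEGER
);
INSERT INTO employees (name, department, salary) VALUES
    ('Asha', 'Eng', 150),
    ('Ben',  'Eng', 140),
    ('Chen', 'Eng', 140),
    ('Dev',  'Eng', 120),
    ('Eli',  'Eng', 100),
    ('Fay',  'Ops', 90),
    ('Gus',  'Ops', 80);
""")

# Rank salaries within each department, highest first, then keep ranks 1 to 3.
TOP3_SQL = """
SELECT department, name, salary
FROM (
    SELECT department, name, salary,
           DENSE_RANK() OVER (
               PARTITION BY department
               ORDER BY salary DESC
           ) AS rnk
    FROM employees
) AS ranked
WHERE rnk <= 3
ORDER BY department, salary DESC, name;
"""

top3 = conn.execute(TOP3_SQL).fetchall()
for row in top3:
    print(row)
```

In an interview it is worth stating the tie-handling choice out loud, since ROW_NUMBER, RANK, and DENSE_RANK each give a different result when two employees earn the same salary.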
Round 2: Big Data & Distributed Systems
🔹How would you design a data pipeline to process 1 TB of data daily in real time?
🔹Explain the differences between Hadoop, Spark, and Flink. Which one would you choose for real-time data processing and why?
🔹How do you optimize data storage for large-scale datasets on AWS S3?
🔹Explain partitioning in Hive and how it improves query performance.
🔹How would you process a huge dataset using AWS Glue or EMR?
Round 3: AWS & Cloud Data Engineering
🔹Describe your experience with AWS Redshift and how you optimized query performance.
🔹How would you architect a scalable ETL pipeline using AWS Lambda and Step Functions?
🔹Explain how you would handle security and data governance in an AWS data lake setup.
🔹Discuss your experience with AWS Glue, Redshift, and S3. What are the best practices for optimizing storage and retrieval?
🔹How would you implement real-time data ingestion and processing using Kinesis or Kafka?
Round 4: Hiring Manager
🔹Discussion of my past experience and projects, plus some resume-based questions
🔹He wanted to know about my good and bad experiences with past employers
🔹How will you work in a team under tight project delivery timelines?
🔹What are you expecting from your next job role?